ABSTRACT



High stakes assessment involves testing students for 
purposes such as grade level retention or advancement, high school 
graduation, selection for special programs or services, or for other "high 
stakes" consequences. Issues surrounding the high stakes assessment of 
English language learners (ELLs) were the focus of an August 1997 
invitational symposium sponsored by the Office of Bilingual Education and 
Minority Languages Affairs (Department of Education), whose proceedings are 
summarized here. The report addresses three central questions, describes the 
symposium discussion on each, and presents research recommendations arising 
from the discussion. The questions are: At what point does testing a 
child in a second language yield meaningful results? What accommodations are 
appropriate for testing ELLs? What role does native language 
assessment play in high stakes testing? A list of participants is appended. 
(MSE) 
Introduction



High stakes assessment involves testing students for purposes such as grade level retention 
or advancement, high school graduation, selection for special programs or services, 
or for other “high stakes” consequences. State-wide performance assessments, standards-based tests, 
and other assessments used to determine the placement or type of educational program a student 
should receive are examples of high stakes assessments. Tests that might be used to determine the 
type of high school graduation certificate or diploma are also considered high stakes assessments. 

Issues surrounding the high stakes assessment of English language learners (ELLs) were the 
focus of an invitational symposium sponsored by the U.S. Department of Education, Office of 
Bilingual Education and Minority Languages Affairs (OBEMLA). The symposium was held August 
26-27, 1997, in Washington, D.C. 

The need for the symposium grew out of concerns raised by educators and policy makers 
alike about how to ensure appropriate and equitable inclusion of ELLs in high stakes assessments. 
In many states and local school districts, ELLs are routinely excluded from participating in such 
assessment activities. In others, ELLs are inappropriately included in the testing programs without 
adequate accommodations that take into account the level of English language fluency the students 
bring with them to the testing situation. The President’s recent proposal to offer a national test and 
the many statewide standards-based assessments in preparation added urgency to holding 
such a symposium. 

Other reasons for the symposium relate to the requirements for assessing ELLs contained 
within Title I and Title VII of the Improving America's Schools Act of 1994. Specifically, for those 
ELLs participating in Title I, the legislation calls for their assessment to the extent practicable and in 
a manner that yields the most accurate results. In order to meet eligibility requirements for Title I 
funding, states must have their accountability and testing systems in place by the year 2000. These 
systems must include the assessment needs of ELLs. Furthermore, new Title VII evaluation 
requirements call for districts receiving Title VII funds to provide information to the U.S. Department 
of Education every two years concerning ELLs’ progress in English proficiency, mathematics, 
language arts, and reading in both English and their native language. 

Finally, there is the larger issue of helping the general public, Congress, and state and 
local policy makers better understand the progress ELLs make in our schools and the difficulties faced 
in fairly and appropriately assessing their academic development. Poor policy decisions have resulted 
from a misunderstanding of the education of ELLs and from an inability to show student progress. 
A research agenda is needed to direct resources and attention to the critical questions of when and 
how to appropriately assess ELLs. 

Critical questions of national importance regarding ELLs and assessment: 

1. At what point does testing a child in the second language yield meaningful 
results? 

2. What accommodations are appropriate for testing ELLs? 

3. What role does native language assessment play in high stakes testing? 

Symposium Participants 

In order to address each of these questions, 52 participants from diverse areas of the education 
community were identified and invited to participate in the two-day symposium. The participants 
represented a broad cross section of stakeholders concerned with the education of ELLs in this nation. 
Administrators of district, state, and federal programs for ELLs were involved, as well as directors 
of research institutes and information centers, professional education associations, technical 
assistance centers, civil rights/advocacy groups and independent consultants. Several representatives 
from OBEMLA participated in the conference, including its Director, Delia Pompa. Administrators 
of related federal programs also were invited, including representatives of the U.S. Department of 
Education’s Title I programs and the Office of Educational Research and Improvement (OERI). In 
addition, representatives of influential education organizations attended, including the Council of Chief State 
School Officers, Educational Testing Service (ETS), and the American Educational Research 
Association (AERA). A complete list of participants is contained in Appendix A. 

Building Consensus 

Prior to attending the symposium, participants were asked to consider assessment practices 
in their districts or states, test development efforts underway, as well as existing research related to: 
students’ readiness for English language testing; appropriate testing accommodations for ELLs and 
the consequences of exclusion; and the role native language assessment could play in high stakes 
testing and assessment. Responses to these issues served as a survey of current practice, which was 
shared with the participants on the first day of the symposium. 

Participants were assigned to one of three panels corresponding to the three questions posed 
in the Introduction above. Facilitators for the three panels were Shelly Spiegel-Coleman, an English 
as a second language consultant at the Los Angeles County Office of Education; Diane August, a 
researcher and consultant on the education of English language learners; and Cecilia Navarrete, 
consultant and adjunct Associate Professor at New Mexico Highlands University. 

Each panel was charged with the task of creating a specific, targeted research agenda 
pertaining to one of the questions cited above. In order to define a research agenda, the participants 
met in their respective panels to first define the issue(s) and their implications for inclusion of ELLs 
in high stakes assessments. A key part of this process was articulating a set of researchable sub- 
questions that would help to explicate or further clarify the larger question to be addressed by the 
panel. After sub-questions and issues were identified and discussed, participants prioritized their 
questions, selecting 10 to 15 fundamental or primary questions that need to be answered. Part of the 
task also involved separating out policy issues from needs for research, a task not easily achieved 
since many research questions about inclusion of ELLs in assessments are linked to matters of policy. 

Next, the panel participants selected what they believed to be the five most important 
sub-questions, a process that helped to prioritize the research needs related to each of the symposium’s 
three broad research questions. These questions were then shared with all participants and discussed 
in a plenary session, which brought together all three panels. 

The final step in the conference involved developing a research plan to address each of the 
three symposium questions. Meeting in the three panels, participants identified existing research, 
research underway, and data bases that could be used to begin answering research questions posed 
by the three panels. Participants also identified researchers, organizations, school districts, and other 
groups who collect data that might be reanalyzed or used to answer questions raised by the panelists. 
Additionally, panelists identified organizations with whom partnerships might be formed to fund and 
support research on assessment of ELLs, including organizations such as the Council of Chief State 
School Officers, AERA, ETS, and government agencies, such as the U.S. Department of Education’s 
Office of Educational Research and Improvement. The results of these deliberations in each panel 
were subsequently presented to the entire participant group in a plenary session on the second day of 
the symposium. 



Building a Research Agenda 



The following sections summarize the participants’ proposed research agenda for the three 
fundamental questions concerning the high stakes assessment of ELLs. 

At What Point Does Testing a Child in a Second Language Yield Meaningful 
Results? 

Those responsible for outlining a research agenda for this question began their inquiry by 
restating the question. This restatement was necessary in order to capture the difficulty in determining 
the precise point when all students will be ready for such testing. The panel’s revised question was: 
“When is there a higher probability that the results from testing in the second language are valid?” 

The panelists agreed that research is needed to determine the point along the 
continuum of developing English proficiency at which an ELL can take a high stakes test in English and have 
the results reflect actual achievement rather than a score attributable purely to chance. In 
order to fully explore this question, the panel recommended the use of multiple sources of data. The 
first task would be to review existing data bases at both the district and state levels. Questions such 
data might answer include: 

♦ What kinds of language proficiency assessments are used in both the native language 
and in English? 

♦ What different programs and instructional practices are available to ELLs? 

♦ What are the students’ background characteristics and their results from high stakes 
testing? 

The panel also recommended conducting survey research to more accurately describe the 
background characteristics of ELLs. Data on language proficiency (listening, speaking, reading, and 
writing), coupled with data describing program services, such as length of time in the program, first 
language literacy, and program type, could be used to help administrators and policy makers create 
guidelines for including ELLs in high stakes assessment. 

In order to more clearly define these conditions, research needs to be conducted to develop 
profiles of subgroups of ELLs along a second language acquisition continuum, beginning with 
students who are pre-literate in both English and their native language. The second group to be 
profiled should include students who are literate in their native language, but who have had no 
exposure to English. A third subgroup should include students who are at the point where English 
language testing would yield valid results. Information from these profiles could then be used to 
determine what constellations of characteristics of ELLs create salient profiles that lead to valid 
results on a high stakes test. 

A third issue meriting research is determining when pre-literate students with limited or 
interrupted schooling could be expected to reach established standards of the district and state. 
Research has indicated that for pre-literate students, especially older students, the amount of time 
necessary to reach the point at which English language testing would yield valid results is longer than 
for other ELLs. The panel recommended that valid diagnostic achievement assessments for 
pre-literate students be identified. In addition, research needs to be undertaken to determine how 
many of these students there are and where they are located. 

A fourth issue this panel recommended for examination concerns the effects cultural bias, 
bilinguality, and different regional/social varieties of English have on test validity and student 
performance. Research also is needed on the effect test item and response format have on test validity 
and student performance. 

A final recommendation was to examine existing data and research literature in order to 
develop “think pieces.” Such think pieces would complement and support the research. One 
proposed think piece would examine the history of high stakes testing as it relates to the treatment 
of English language learners, including who is exempted and the criteria for exemption. Another 
needed think piece involves determining what exactly is tested in high stakes assessment and the 
purposes of such assessment. Other areas might address the implications of high stakes testing for 
legislation, for school districts, and for students, and the future direction of such testing in terms of 
national tests and state graduation requirements. In addition, the implications of such testing for 
instructional issues and classroom practices should be examined. 

Research Recommendations: 



1. At what point along the language proficiency continuum does performance on 
a high stakes test yield valid results above chance? Is this point a function of 
time? 

2. What are the constellations of characteristics of limited English proficient students 
that create a series of student profiles that would lead to valid results on a high stakes 
test? 

3. At what point can pre-literate limited English proficient students with limited and/or 
interrupted schooling be expected to reach established standards of the district and 
state? 

4. What are the effects on test validity and student performance of: (a) cultural bias 
(background knowledge, world view, etc.); (b) bilinguality (proficiency in each 
language); (c) item and response format; and (d) different regional/social varieties of 
English? 





What Accommodations are Appropriate for Testing ELLs? 



The panel discussed the issue of whether the purpose of high stakes assessment is to do the 
best job possible of measuring what ELLs know and can do or whether it is to determine how they 
compare with their fully English-proficient peers. Different goals presuppose entirely different 
methods of incorporating these students into high stakes testing. If the purpose is to find out how they 
compare with their peers, one might provide no modifications at all. If, on the other hand, the purpose 
is to find out as much as possible about their knowledge and skills, one might provide as many 
modifications as necessary. Between these two extremes, there is a continuum of options that entail 
compromises. 

The panel formulated three principles that should undergird any recommendations for 
accommodating ELLs in high stakes assessments. The first is that ELLs should be included in 
assessment systems for accountability purposes. Inclusion can take two forms — full inclusion in 
which ELLs are given the same assessments as their fully English proficient peers or partial inclusion 
in which they take a standard assessment with accommodations or an alternative assessment. The 
second principle is that accommodation should be applied to a broad range of activities including test 
development, test preparation of students, test administration, student response modes, scoring, 
benchmarking results, and reporting student outcomes. Third, assessments should mainly be used to 
help educators improve instruction. 

In order to develop a research agenda inclusive of all possible accommodations, the panel 
identified possible methods of incorporation. From the outset, high stakes assessments should be 
developed with ELLs in mind. They should be considered in the development of the test construct, 
framework, and individual items, and they should be included in sufficient numbers in the sample 
used to norm the assessments. Prior to the administration of assessments, ELLs should be provided 
with a review of the content to be covered in the assessment and receive practice and coaching with 
the test format. 

Strategies to include ELLs in assessment and accountability systems, when they are unable 
to take the standard version of the test, might include the use of native language assessments, 
bilingual versions of the assessment, alternative modes of response, and portfolios of their work. 
Teacher judgements of student work might serve as alternatives to taking the tests. 

During test administrations, modifications might also be made. Procedures currently in use 
that need further development and evaluation include: extra time, the use of glossaries or dictionaries, 
reading the directions aloud in English and/or the native language, repeating the instructions, 
simplifying the instructions, providing a test administrator familiar to the students, providing small 
group or individual administrations of the assessment, and providing for multiple testing sessions. 

Another issue in some types of high stakes assessment of ELLs’ subject matter knowledge 
is the error that results from inaccurate and inconsistent scoring of open-ended or performance-based 
measures. The development of scoring rubrics and procedures for constructed response items that 
are sensitive to the language and cultural characteristics of ELLs is needed. The Council of Chief 
State School Officers recently developed a Scorer’s Training Manual to be used by states and school 
districts to aid in the scoring of ELLs’ answers to open-ended mathematics questions. This manual 
will be piloted with ELLs who participated in the 1996 National Assessment of Educational Progress 
math test. 

An additional issue is the need for benchmarks to determine when ELLs have attained those 
precursor skills and knowledge already possessed by students who arrive in school speaking English. 
Because of the difficulties in assessing ELLs, it may be important to assess their access to necessary 
resources and conditions, such as adequate and appropriate instruction. 

Research Recommendations 



There is a critical need for research to determine how to best assess first and second language 
development and literacy. To assess language proficiency appropriately, both discrete language skills 
(e.g., vocabulary and grammar) and more authentic and holistic uses of language should be 
assessed. 

The second area of need is to determine when ELLs should take the same subject area 
assessments as fully English proficient students; when they should take an accommodated version; 
when they should take an alternative assessment; and when alternative procedures such as teacher 
judgement or score prediction should be used. Panelists recommended that a large-scale survey of 
current practices be conducted. 

A third suggestion for research involves how to effectively accommodate ELLs in high stakes 
assessments. The panel recommended that several studies be undertaken to address this question. 
A large-scale survey of state and district practices needs to be conducted. Once promising 
accommodations have been identified, studies need to be initiated to determine their effectiveness. 
It is important to determine whether the accommodation(s) improve student performance and how 
the improvement of ELLs compares with improvement for fully English proficient peers. Studies are 
needed to compare the performance of English proficient peers and ELLs on both the standard and 
accommodated versions of the assessment. If performance improves for all, the assessment might 
be considered a better measure. If performance is improved for ELLs only, a validity study should 
be conducted to determine how ELL student performance on the modified assessment compares with 
actual classroom performance. 
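
As an illustration only, the comparison described in the preceding paragraph can be sketched in Python. The group labels ("ELL" for English language learners, "EP" for English proficient peers), the function names, and the zero threshold are assumptions introduced here and are not part of the panel's recommendations. The sketch compares raw mean gains only; an actual study would require adequate samples and significance testing. The third outcome (no gain for ELLs) is not discussed by the panel and is included only so that the function always returns a result.

```python
from statistics import mean

GROUPS = ("ELL", "EP")  # hypothetical labels: ELLs and English proficient peers


def accommodation_gains(scores):
    """Return the mean gain (accommodated minus standard) for each group.

    `scores` maps (group, version) to a list of test scores, where version
    is "standard" or "accommodated".
    """
    return {
        group: mean(scores[(group, "accommodated")])
        - mean(scores[(group, "standard")])
        for group in GROUPS
    }


def interpret(gains, threshold=0.0):
    """Map the pattern of gains to the follow-up step named in the text."""
    ell_improved = gains["ELL"] > threshold
    ep_improved = gains["EP"] > threshold
    if ell_improved and ep_improved:
        return "improved for all: the assessment might be considered a better measure"
    if ell_improved:
        return "improved for ELLs only: conduct a validity study against classroom performance"
    # Assumption: this case is not addressed in the report.
    return "no improvement for ELLs: accommodation not shown to be effective"


# Invented scores, for illustration only.
example = {
    ("ELL", "standard"): [40, 50],
    ("ELL", "accommodated"): [55, 65],
    ("EP", "standard"): [70, 80],
    ("EP", "accommodated"): [70, 80],
}
print(interpret(accommodation_gains(example)))
```

With the invented scores above, only the ELL group gains, so the sketch points to the validity study the panel describes.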

Research is needed to develop rubrics and scoring procedures that accurately measure student 
performance. Rubrics that distinguish between errors due to language proficiency and those related 
to lack of content knowledge and skills need to be developed. Additionally, methods to train scorers 
are necessary, since without training, scorers have been found to rate the same work very differently. 
Research is also needed to determine the best methods of reporting and interpreting scores for the 
school and community, including how to format the information so it is comprehensible for different 
audiences and how to explain the accommodations. Along these lines, research is needed to 
determine the credibility of the accommodations for different audiences. 

Because of the difficulties in assessing English language learners, it may be important to 
assess their access to necessary resources and conditions, such as adequate and appropriate 
instruction. Although there has been substantial work in defining some conditions, such as content 
coverage and time on task for mainstream students, the research base for defining the most important 
and effective resources and conditions for English language learners is weak. 



What Role Does Native Language Assessment Play in High Stakes Testing? 



The panel proposed three areas of interest under which specific research questions were 
categorized. These include: utilization of assessment investigations; technical investigations; and cost 
benefits/policy investigations. 

Under the category utilization of assessment, research should be undertaken to determine 
under what circumstances ELLs should take native language assessments and what realistic level of 
first and second language proficiency should determine readiness for high stakes testing. Research 
should also examine when ELLs are prepared to take such tests. Questions here concern what types 
of test taking skills are needed and what kinds of learning opportunities are necessary in order to be 
ready to participate in such assessments. 

Recommended technical investigations include identifying or developing methods of 
preparing equivalent versions/forms for tests in more than one language, determining how such 
versions will be normed or scaled, and identifying what cultural and item biases impact student 
performance on native language assessments. Additionally, research needs to examine how to devise 
native language assessments that yield comparable results to English high stakes assessments. A final 
research area for this category involves an examination of the extent to which native language 
proficiency and literacy are factors in high stakes assessment outcomes. 

Under the rubric of cost benefits/policy investigations, research is needed to determine the 
benefit of conducting native language assessment for different stakeholder groups. A cost benefits 
issue that needs to be pursued involves knowing when it is practicable or necessary to administer or 
develop native language assessments. Another research question concerns whether “high stakes” are 
the same when testing in the native language as opposed to testing in English. 

The panel also recommended several additional areas for investigation. Since many local 
schools and state departments of education are in the process of using and/or designing tests, the 
panel suggested examining the challenges and attempts test developers have experienced in preparing 
native language assessments. A survey of the results obtained from these assessments and lessons 
learned would be useful in directing and refining further research. Other areas in need of research 
attention include identifying what types of education programs and contexts are necessary for native 
language testing and what criteria should inform the development and/or selection of a native 
language assessment for high stakes purposes. 

Research Recommendations: 



1. What are the consequences (intended/unintended) of using the first language (L1) or 
the second language (L2) in high stakes testing? 

2. What is the relationship between opportunity-to-learn in L1 and L2 and performance 
in high stakes testing in L1/L2? 

3. What are the appropriate proficiency levels of L1 and L2 necessary for high stakes 
testing in L1 and L2? 

4. How do various native language accommodations (e.g., dictionaries, bilingual forms, 
oral native language instruction, relaxed time limits) affect performance on high stakes 
testing for students at different levels of native language proficiency? 

5. Are there conflicting policies relating to high stakes testing for English language 
learners at the state/district level? 



Future Directions 



As described above, establishing a research agenda for the high stakes assessment of English 
language learners poses many challenges in light of the critical issues that must also be addressed. The 
comprehensive and forward-looking directive suggested by the symposium participants can be used 
by OBEMLA to guide both the gathering of existing research and the designing of future research 
projects. In initiating collaborative research efforts with U.S. Department of Education entities such 
as the Office of Educational Research and Improvement and the Office of Elementary and Secondary 
Education, with state education agencies, various research institutes and professional associations, 
OBEMLA can use this document as a basis for designing research plans based on the questions and 
issues raised in this symposium. 